Multistage Masking Methods for Microdata Protection
Authors
Abstract
Publication of statistical information by national agencies in the form of microdata (i.e., individual records) raises the problem of preventing disclosure of confidential information about particular respondents without significantly damaging the utility of the data being protected. Statistical agencies often disseminate information only in the form of tables. However, microdata (records that contain information about individuals or establishments) offer far greater flexibility for statistical research, especially of an exploratory nature, than tables. As a result, there has been an increasing demand from users for such data, and agencies would like to be able to comply with this demand, provided that confidentiality is not compromised. In particular, there is a well-recognized need to prevent both identity and attribute disclosure. Before releasing microdata, a statistical office deletes direct identifiers, such as names and addresses, from the data; however, the risk of identification still exists, for example, through linkage of the released data to existing databases available to users. Released microdata are therefore typically also perturbed to make disclosure more difficult. For this purpose, a number of Statistical Disclosure Control (SDC) techniques have been developed. The following are the most widely used SDC methods for microdata:
• Rankswapping: First, the values of variable Vi are ranked in ascending order. Then, each ranked value of Vi is swapped with another ranked value randomly chosen within a restricted range; e.g., the ranks of two swapped values cannot differ by more than p percent of the total number of records.
• Microaggregation: Records are clustered into small aggregates or groups of size at least k. Rather than publishing an original variable Vi for a given record, the average of the values of Vi over the group to which the record belongs is published. Classical microaggregation,
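The two methods described above can be sketched for a single numeric variable as follows. This is a minimal illustration, not the implementation used by the authors: the function names `rank_swap` and `microaggregate`, the `seed` parameter, the rule for picking swap partners, and the choice to build groups from consecutive sorted values (with the last group absorbing any remainder) are all assumptions made for the example.

```python
import random


def rank_swap(values, p, seed=None):
    """Rank swapping for one variable (illustrative sketch).

    Each value is swapped with a not-yet-swapped value whose rank is at most
    p percent of the number of records away. Values with no available
    partner are left unchanged.
    """
    rng = random.Random(seed)
    n = len(values)
    window = int(n * p / 100)  # maximum allowed rank difference
    # order[r] is the index of the record holding the r-th smallest value.
    order = sorted(range(n), key=lambda i: values[i])
    masked = list(values)
    swapped = [False] * n  # indexed by rank
    for r in range(n):
        if swapped[r]:
            continue
        upper = min(n, r + window + 1)
        candidates = [s for s in range(r + 1, upper) if not swapped[s]]
        if not candidates:
            continue
        s = rng.choice(candidates)
        i, j = order[r], order[s]
        masked[i], masked[j] = values[j], values[i]
        swapped[r] = swapped[s] = True
    return masked


def microaggregate(values, k):
    """Univariate microaggregation (illustrative sketch).

    Records are sorted, split into consecutive groups of k (the last group
    takes any remainder, so every group has at least k records), and each
    value is replaced by the mean of its group.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of records")
    order = sorted(range(n), key=lambda i: values[i])
    masked = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # too few records left for another full group
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
        start = end
    return masked


if __name__ == "__main__":
    incomes = [10, 1, 7, 3, 8, 2, 9]
    print(rank_swap(incomes, p=30, seed=0))
    print(microaggregate(incomes, k=3))  # [8.5, 2.0, 8.5, 2.0, 8.5, 2.0, 8.5]
```

Rank swapping keeps the released values identical to the original ones as a set and only changes which record carries which value, whereas microaggregation changes the values themselves while preserving the overall mean.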
Similar resources
Microdata Protection
Governmental, public, and private organizations are more and more frequently required to make data available for external release in a selective and secure fashion. Most data are today released in the form of microdata, reporting information on individual respondents. The protection of microdata against improper disclosure is therefore an issue that has become increasingly important and will co...
Outlier Protection in Continuous Microdata Masking
Masking methods protect data sets against disclosure by perturbing the original values before publication. Masking causes some information loss (masked data are not exactly the same as original data) and does not completely suppress the risk of disclosure for the individuals behind the data set. Information loss can be measured by observing the differences between original and masked data while...
Releasing Microdata: Disclosure Risk Estimation, Data Masking and Assessing Utility
Statistical agencies release sample microdata from social surveys under different modes of access ranging from Public Use Files (PUF) in the form of tables or highly perturbed datasets to Microdata Under Contract (MUC) for researchers and licensed institutions where levels of protection are less severe. In addition, statistical agencies often have on-site datalabs where registered researchers c...
Fuzzy Microaggregation for Microdata Protection
In this work we describe a microdata protection method based on the use of fuzzy clustering and, more specifically, using fuzzy c-means. Microaggregation is a well-known masking method for microdata protection used by National Statistical Offices. Given a set of objects described in terms of a set of variables, this method consists of building a partition of the objects and then replacing the ori...
Disclosure risk assessment in statistical microdata protection via advanced record linkage
The performance of Statistical Disclosure Control (SDC) methods for microdata (also called masking methods) is measured in terms of the utility and the disclosure risk associated with the protected microdata set. Empirical disclosure risk assessment based on record linkage stands out as a realistic and practical disclosure risk assessment methodology which is applicable to every conceivable maski...